We show that depth information of the vertices of the triangles can be
obtained by using a modified version of the Incremental Rigidity Scheme
devised by Ullman (1984). In cases where the motion of the figure displays
fundamentally different views at each frame presentation the algorithm works
well, not only for strictly rigid motion (Ullman 1984, Grzywacz and Hildreth
1985) but also for a limited amount of bending deformation. We modify this
scheme to allow for flexing motion (in the sense defined above) and call
our version the Incremental Semirigidity Scheme.


Introduction 


In order to make the study of visual perception and information processing
more systematic (and also easier), we divide the multitude of visual
information faculties into modules which are treated as being more or less
independent. Examples of these are stereo, motion, color and shape from shading.
This strategy has led to some very fruitful results (Marr 1982).

All modules act on a basic representation of the image consisting of
primitive elements which could be perceptually salient features of the image,
such as points or lines, or even the intensity values themselves. Low-level
vision consists of applying the different modules to these primitive
representations to build up a description of the world.

The basic input to a vision system is the intensity changes occurring on a
two-dimensional image surface. The visual processing system has to recover
the complete three-dimensional description of objects in space from this
primitive set of data. Mathematically, the visual data is given in terms of
variables defined on a two-dimensional manifold, so that it lacks the
necessary information to reconstruct the surfaces of objects embedded in the
three-dimensional world. This fundamental issue is given a mathematical basis
through the use of regularization theory (Poggio and Torre 1984). As a
consequence of this, additional information has to be furnished to the visual
system, usually in the form of constraints (which describe basic assumptions
about the three-dimensional objects). Examples of these constraints are the
rigidity of three-dimensional objects (in visual motion) or surface smoothness
(in surface interpolation).

Visual  motion,  as  demonstrated  by  numerous  psychophysical  experiments 
(for  reviews  see  Braddick  1980,  Hildreth  and  Koch  1986),  is  given  in  two 
modes:  one  is  short  range,  so  that  information  is  collected  through  the  use 
of  frames  very  close  in  terms  of  temporal  and  spatial  displacement  and  the 
other  is  long  range.  In  the  present  paper  we  will  deal  mainly  with  the  second 
type  of  motion. 

Depth from motion can be recovered by seeing the object from different
viewpoints. Psychophysical experiments (Wallach and O'Connell 1953,
Johansson 1975, Wertheimer 1912) have explicitly shown that a subject is able
to perceive the complete three-dimensional form of a moving object when
presented with different views of it (for both continuous and discrete
presentations).

The process of finding the structure of an object from motion information
(and its depth) can be divided into a three-step process: (i) determining the
primitives for the two-dimensional description, (ii) making the correspondence
between these primitives and (iii) integrating information between the frames
to get the structure.


The lack of depth information (due to the projection of the object onto
the image plane) can be counterbalanced by assuming some sort of constraint
for the object. In the case of rigid objects (Ullman 1979) it can be shown that
three different views of four non-coplanar (arbitrarily chosen) points on the
surface of the object give enough information to recover the motion of the
object and hence its depth values.

The first stage of the motion module is to determine the correspondence
between features in different frames. These features may include points, line
segments or aggregates of them. In this paper we assume that we know the
correspondence between features and can track them as they move (in the
image plane) through the successive (discrete) frames. We must now make
assumptions about the object in order to recover its depth. A standard, though
strong, assumption is that the object moves rigidly in space (Ullman 1979).
We would like to weaken this assumption to allow for more general motion.
The Incremental Rigidity Scheme (IRS) (Ullman 1984) is able to recover the
structure of a rigidly moving object by assuming the minimal change in
rigidity of this object between frames. The IRS is also able to deal with a
limited amount of non-rigidity of the object. In this paper we show how the
IRS can be (in its modified version) extended to deal with objects undergoing
non-rigid flexing motion preserving the Gaussian curvature. Koenderink and van
Doorn (1986) have studied motions of this type. For clarity we refer to the
modified IRS as the ISRS (Incremental Semi-rigidity Scheme).

The non-rigid flexing motion we consider corresponds to rigid triangles
bending relative to each other. We will show in the next section that this
motion corresponds to non-rigid motion which preserves the Gaussian curvature.
For example, motion of this type would allow a sheet of paper to be deformed
into a cylinder.

The basic idea of the IRS is to construct an internal model of the object,
which is initially chosen to be flat, and to update this model assuming
minimal change of rigidity between consecutive image frames. This change in
rigidity is measured by the changes in distance between different points of the
object. Each new frame yields more information about the object and the
scheme converges to a fixed model. For rigid motion this gives good results
(Ullman 1984). Our modification of the algorithm requires minimal change
of rigidity only for points which lie at adjoining vertices of the triangulation.
This is a weaker assumption than global rigidity and, as we shall show, in some
cases allows the recovery of structure of objects undergoing highly non-rigid
motion.

We work specifically with two types of figures built up of triangles,
allowing bending deformations to take place. The first figure is made out of
two triangles with a common edge which constitutes the axis of bending. To
simulate the non-rigid motion we use a two-step procedure. First we rotate
the whole figure as a rigid object and then we bend one triangle with respect
to the other (over their common edge). The second figure consists of six
triangles with adjacent common edges. Its non-rigid motion, modulo global rigid
rotation, is similar to the folding (and unfolding) of an umbrella. Note that for
both these examples the triangles are not deformed, so the Gaussian curvature
remains unchanged.

As  an  intermediate  step  we  applied  the  ISRS  to  the  rigid  motion  of  a 
single  triangle  and  of  six  triangles  with  adjacent  (common)  edges.  In  both 
cases,  as  expected,  the  algorithm  works  well.  Finally,  we  studied  the  six 
triangle  figure  including  a  small  deformation  of  one  of  the  base  edges,  keeping 
the  others  fixed,  which  corresponds  to  a  deformation  which  changes  the  global 
curvature. 


This article is organized as follows. In chapter 2 we give a general overview
of the triangulation method, which is known to physicists as "Regge calculus".
In  chapter  3  the  Incremental  Rigidity  Scheme,  and  the  ISRS,  is  presented  and 


Finally  in  chapter  6  we  draw  conclusions  and  indicate  future  research. 

Regge calculus

The basic idea of the Regge calculus (Regge 1961) is to approximate a
general surface by a polyhedron built up of triangles. We recover the structure
of the general surface as we send the number of triangles to infinity, while
maintaining the total area of the polyhedron fixed.


Figure 1 A general triangulation, panels (a) and (b)

Suppose that we construct a curved triangle on the surface of a sphere, where
the edges are geodesics (a geodesic is the line of shortest length between two
points). The sum of the internal angles of this triangle is different from π,
because the surface is curved, and its curvature is given by the difference
between the sum of the internal angles and π. By using a collection of these
curved triangles we can reconstruct the surface of the sphere. This means
that, at any point of the sphere, the Gaussian curvature (and the principal
curvatures), using curved triangles, is always the same as the curvature of the
sphere.


On the other hand, the Regge calculus builds up a net of flat triangles
which only approximate the shape of the initial surface to a given precision
(which in turn depends on the scale of triangulation). The triangles used in
the triangulation method of the Regge calculus have straight edges, so that
the curvature content lies exclusively at the vertices (the intersection points
of edges of different triangles). Let us now suppose that we approximate the
surface of the sphere by a net of triangles (with straight edges). Also, let us
concentrate on an arbitrary vertex (intersection of edges) of this triangulation.
If we take the difference between 2π and the sum of the angles (adjacent to
this vertex) we get the deficit angle, which measures the local curvature (at
the location of the vertex) of the (triangulated) surface.


Let us take a particular vertex a, so that the angles of its adjacent triangles
(denoted by r) are represented by $\theta_a^r$. The deficit angle $\delta_a$ is
defined by the following expression

$$\delta_a = 2\pi - \sum_r \theta_a^r \tag{1}$$

Figure 2 The deficit angle

The total curvature of this triangulated surface is given by the sum of all
deficit angles. This means, if R is the total curvature, then

$$R = \sum_a \delta_a \tag{2}$$


As a consequence of this, the curvature of a general surface, which must
be calculated locally, can be approximated by the sum of the deficit angles
for an arbitrary triangulation of it.


In general, the curvature of an arbitrary surface is given by the Euler
number $\chi$, which describes the topological content of the surface, and is
given by

$$\chi = 2 - \eta \tag{3}$$

where $\eta$ is the surface genus. Let us, as an illustration, think of the surface
of a sphere which is approximated by a net of triangles. By using the Euler
formula we can write the surface genus as

$$\eta = 2 + e - v - f \tag{4}$$

where e, v and f are respectively the number of edges, vertices and faces of
the triangulation net. In order to create a hole we have to eliminate an entire
face from this surface, and as a consequence, an equal number of (three) edges
and vertices. By simple inspection of equation (4) one can conclude that by
creating a hole on the (triangulated) surface the surface genus is increased by
one unit, and, as a consequence of formula (3), the Euler number is reduced
by an equal amount. If, on the other hand, we want to create a handle out of
the original surface, we have to eliminate two faces and identify the perimeters
(constructed from the edges bordering the holes). This means, by analogy to
the creation of a hole, that the surface genus is increased (and the Euler
number reduced) by two units. For the general case, if we create B holes
and H handles then the surface genus is given by $\eta = B + 2H$, and the Euler
number by

$$\chi = 2 - B - 2H \tag{5}$$

The relationship between the curvature (R) and the surface genus $\eta$ is
given by the Gauss-Bonnet curvature theorem (for general compact
polyhedrons). It can be written as

$$R = \sum_{i=1}^{N} \delta_i = 2\pi (2 - \eta) \tag{6}$$

So, for example, when we want to know the total curvature of a sphere,
which can be done by summing over all the deficit angles, we just have to
observe that its genus is zero (it has no holes or handles), and as a result of
this we obtain 4π. In the case of a torus, whose genus is 2 (one handle), the
total curvature is zero.

We can describe the Regge calculus by the following block diagram, which
describes a simple algorithm to calculate the curvature of a triangulated
surface.
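
The curvature computation described above can be sketched in a few lines of
Python. This is only an illustration under our own assumptions, not the block
diagram of the original report: the function names (`corner_angle`,
`deficit_angles`, `total_curvature`) and the octahedron test surface are
hypothetical choices made for the example. The sketch sums the corner angles
meeting at each vertex, forms the deficit angle of equation (1), and adds the
deficit angles as in equation (2).

```python
import math
from collections import defaultdict


def corner_angle(p, q, r):
    """Interior angle at corner p of the flat triangle (p, q, r), in radians."""
    u = [q[k] - p[k] for k in range(3)]
    v = [r[k] - p[k] for k in range(3)]
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(a * a for a in v))
    # Clamp to [-1, 1] to guard against rounding before calling acos.
    return math.acos(max(-1.0, min(1.0, dot / (norm_u * norm_v))))


def deficit_angles(vertices, triangles):
    """Equation (1): 2*pi minus the sum of the corner angles at each vertex."""
    angle_sum = defaultdict(float)
    for a, b, c in triangles:
        angle_sum[a] += corner_angle(vertices[a], vertices[b], vertices[c])
        angle_sum[b] += corner_angle(vertices[b], vertices[c], vertices[a])
        angle_sum[c] += corner_angle(vertices[c], vertices[a], vertices[b])
    return {v: 2.0 * math.pi - s for v, s in angle_sum.items()}


def total_curvature(vertices, triangles):
    """Equation (2): total curvature as the sum of all deficit angles."""
    return sum(deficit_angles(vertices, triangles).values())


# Hypothetical test surface: an octahedron, a coarse triangulation of a sphere.
OCTA_VERTICES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                 (0, -1, 0), (0, 0, 1), (0, 0, -1)]
OCTA_TRIANGLES = [(0, 2, 4), (2, 1, 4), (1, 3, 4), (3, 0, 4),
                  (2, 0, 5), (1, 2, 5), (3, 1, 5), (0, 3, 5)]

if __name__ == "__main__":
    print(total_curvature(OCTA_VERTICES, OCTA_TRIANGLES) / math.pi)
```

For the octahedron each vertex is shared by four equilateral triangles, so
every deficit angle is 2π/3 and the six of them sum to 4π up to rounding, the
value quoted above for the sphere.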

Figure 3 Regge calculus block diagram

What happens if the size of all triangles in the triangulation net goes
to zero at the same rate as their number increases? We recover the original
surface, whose Gaussian curvature is given by

$$R = \int d^2x \, \mathcal{R} \tag{7}$$

where $\mathcal{R}$ is the local (Gaussian) curvature.

The idea of the Regge calculus can be generalized to an n-dimensional
manifold, where it approximates a smoothly curved n-dimensional Riemannian
manifold by a collection of n-dimensional elements without any curvature
content (like the triangles in 2 dimensions) joined by (n - 2)-dimensional
elements (points in 2 dimensions) where the curvature content is concentrated.


It is certainly easier to calculate the curvature using a triangulation net,
since we simply have to sum over all deficit angles, rather than taking the
continuum limit, in which case we have to calculate the curvature of the
surface at each point (which is very sensitive to noise).

Our basic idea is to represent objects by the position of the set of points
corresponding to a triangulation of the surface. We can track the positions
of this set of points through successive time frames. It will then be possible
to determine the curvature changes in time very simply. More importantly,
this representation enables us to relax the rigidity assumption and allow for
a class of non-rigid motion. Suppose the triangles are kept fixed in size but
are allowed to flex relative to each other. Koenderink and van Doorn (1986)
show that although this kind of motion is non-rigid it nonetheless is
sufficiently constrained to allow the structure to be recovered. We show that it
is straightforward to adapt the incremental rigidity scheme to deal with this
type of motion. Thus our method can be thought of as a type of incremental
semi-rigidity scheme.

This semi-rigid motion can be easily analysed in terms of Regge calculus.
Since the triangles are fixed, the deficit angles at the vertices are constant.
Thus the Gaussian curvature does not change during the motion. Recall that
the Gaussian curvature is the product of the two principal curvatures of the
object. So, for example, this motion will allow a cylinder to be transformed
into a plane (both have zero Gaussian curvature), but not into a sphere.

Incremental  Rigidity 

We  assume  that  we  have  determined  the  correspondence  between  the  vertices 
of  a  triangulation  of  an  arbitrary  object  and  want  to  determine  its  structure. 
Thus  we  assume  the  correspondence  problem  and  the  triangulation  problem 
are  solved  and  concentrate  directly  upon  determining  structure.  It  could, 
however,  be  possible  to  solve  the  correspondence  problem  at  the  same  time 
as  the  structure  is  obtained. 

Human visual motion is measured in (at least) two modes: one, the short-range
mode, deals with space-time information processed discretely but with
a high sampling rate, while the other, the long-range mode, exhibits a low
sampling rate. The long-range mode has an ISI (interstimulus interval, the
temporal interval between two samplings) of at least 300 ms, in contrast to
the short-range mode with an ISI of less than 80-100 ms (Braddick 1980; for a
discussion see Marr 1982). Although these two modes act independently in
the initial stages of visual motion processing (Braddick 1980), it is supposed
that at later stages they have some kind of interaction (Clatworthy and Frisby
1973).

We will assume the analog of the long-range mode, which leads to the
perception of apparent motion. The scheme that we adopt for determining
structure is based on the Incremental Rigidity Scheme (Ullman 1984) and
consists of updating an internal model of the structure of the object. More
precisely, we initially assume the object is flat and update the model between
time frames by assuming minimal change in rigidity (or semi-rigidity) between
frames. This minimal change is enforced by minimizing a cost function, thus
yielding a series of estimated depth values, which gradually converge to the
correct result.

The IRS can be described as follows. Suppose there are P points. Initially
the model assumes that the depth values are zero for all points. Let us
suppose that for the Nth frame we have a model M(N) which describes a
configuration of these points in terms of its X, Y and Z coordinates. The X and
Y coordinates are measured from their two-dimensional projection onto the
image plane (we assume orthographic projection) while the Z coordinates
(depth values) have to be estimated. We define $L_{ij}^N$ as the squared distance
between points i and j, for the Nth frame, so that

$$L_{ij}^N = (X_i^N - X_j^N)^2 + (Y_i^N - Y_j^N)^2 + (Z_i^N - Z_j^N)^2, \tag{8}$$

and the sum of $L_{ij}^N$ over all (P) points we define as $\mathcal{L}^N$, that is,

$$\mathcal{L}^N = \sum_{i=1}^{P} \sum_{j=1}^{P} L_{ij}^N. \tag{9}$$

Now we go to the next frame and calculate the same quantity $L_{ij}^{N+1}$, which
is also a function of the unknown (new) depth values $\{Z_i^{N+1}\}$. The difference
between $\mathcal{L}^{N+1}$ and $\mathcal{L}^N$ is a measure of the change in the rigidity of the
internal model. To calculate the new depth values $\{Z_i^{N+1}\}$ we minimize a cost
function $C^{N+1,N}$, defined as the square of the difference between $\mathcal{L}^{N+1}$
and $\mathcal{L}^N$. Thus, we minimize

$$C^{N+1,N} = [\mathcal{L}^{N+1} - \mathcal{L}^N]^2 \tag{10}$$

with respect to $\{Z_i^{N+1}\}$. Note that the $\{Z_i^N\}$ are known (they have been
obtained by an analogous process for the Nth frame). Having calculated the
new depth values, we move to the (N + 2)th frame and do the same
computation for $\{Z_i^{N+2}\}$, and so on until we obtain the correct depth values
(which correspond to a global minimum of the cost function). The minimization
process gives us a third degree polynomial equation in the $Z_i^{N+1}$'s.
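
The update step just described can be illustrated with a short NumPy sketch.
It is a sketch under our own assumptions and not the original implementation:
the function names, the backtracking gradient descent, the step size and the
number of steps are all hypothetical. The sketch evaluates the sum of equation
(9), the cost of equation (10) and its analytic gradient with respect to the
new depth values (which is cubic in the depths, in line with the third degree
polynomial mentioned above), and then lowers the cost starting from the
previous depth estimates.

```python
import numpy as np


def summed_sq_dist(xy, z):
    """Equation (9): squared 3-D distances summed over all ordered pairs (i, j)."""
    pts = np.column_stack([xy, z])              # shape (P, 3)
    diff = pts[:, None, :] - pts[None, :, :]    # shape (P, P, 3)
    return float(np.sum(diff ** 2))


def irs_cost(z_new, xy_new, l_old):
    """Equation (10): squared change of the summed squared distances."""
    return (summed_sq_dist(xy_new, z_new) - l_old) ** 2


def irs_cost_grad(z_new, xy_new, l_old):
    """Analytic gradient of equation (10) with respect to the new depths."""
    p = len(z_new)
    # Derivative of the double sum in (9) with respect to each depth value.
    d_sum = 4.0 * (p * z_new - np.sum(z_new))
    return 2.0 * (summed_sq_dist(xy_new, z_new) - l_old) * d_sum


def update_depths(xy_new, xy_old, z_old, n_steps=200, step=1e-3):
    """Assumed minimizer: gradient descent with step halving (backtracking)."""
    l_old = summed_sq_dist(xy_old, z_old)
    z = np.array(z_old, dtype=float)
    for _ in range(n_steps):
        grad = irs_cost_grad(z, xy_new, l_old)
        cost = irs_cost(z, xy_new, l_old)
        t = step
        # Halve the step until the cost no longer increases.
        while t > 1e-12 and irs_cost(z - t * grad, xy_new, l_old) > cost:
            t *= 0.5
        if t <= 1e-12:
            break
        z = z - t * grad
    return z
```

Note that in this sketch the gradient vanishes whenever all depth values are
equal, so a perfectly flat initial model is a stationary point of the cost; a
practical run would have to start from slightly perturbed depths.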


At each step of the (internal) iteration procedure, the depth values
$\{Z_i^N\}$ are initial inputs for the next step. So, as the number of iterations
increases, the correct values for the depth of the P points are approached. In
this process $C^{N+1,N}$ is, in general, different from zero, which means that
during the updating procedure the change of structure (according to the
internal model) is non-rigid.


To allow semi-rigid motion (as described at the end of the previous
section) we modify the IRS to the ISRS (Incremental Semi-Rigidity Scheme).
The IRS minimizes the cost function $C^{N+1,N}$, given by equation (10), which
basically measures the change in the (global) rigidity of the internal model
between the two frames. However, if we restrict the summation in the
expression of $\mathcal{L}^N$ given by equation (9) to be only between points which
are vertices of the same triangles, then the difference between $\mathcal{L}^{N+1}$ and
$\mathcal{L}^N$ will depend only on the sum over these vertices and we obtain the
ISRS. We can rewrite equation (9) as

$$\mathcal{L}^N = \sum_{\langle i,j \rangle} L_{ij}^N \tag{11}$$

where the summation is only taken over vertices i, j of the same triangles,
so that the new $C^{N+1,N}$ defined through (11) will now be a measure of the
semi-rigidity of the structure.

The ISRS updates its internal model by minimizing (11) with respect to
$\{Z_i^{N+1}\}$, thus enforcing only a local rigidity. In this sense, the ISRS has
more flexibility to deal with object non-rigidity than does the IRS.
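
The difference between the two sums can be made concrete with a small Python
sketch. Everything here is an illustrative assumption (the function names, the
two-triangle test figure and its coordinates are ours): the restricted sum of
equation (11) runs only over vertex pairs that share a triangle, so a pure
bending about the common edge leaves the semi-rigidity cost at zero, while the
sum over all pairs registers the bending as a loss of rigidity.

```python
from itertools import combinations


def triangle_pairs(triangles):
    """Unordered vertex pairs (i, j), i < j, that belong to a common triangle."""
    pairs = set()
    for tri in triangles:
        for i, j in combinations(sorted(tri), 2):
            pairs.add((i, j))
    return sorted(pairs)


def summed_sq_dist(points, pairs):
    """Equation (11): squared 3-D distances summed over the given pairs only."""
    total = 0.0
    for i, j in pairs:
        total += sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
    return total


def rigidity_cost(points_new, points_old, pairs):
    """Squared change of the summed squared distances between two frames."""
    change = summed_sq_dist(points_new, pairs) - summed_sq_dist(points_old, pairs)
    return change ** 2


if __name__ == "__main__":
    # Hypothetical figure: two triangles sharing the edge (1, 2).
    triangles = [(0, 1, 2), (1, 2, 3)]
    flat = [(-1, 0, 0), (0, -1, 0), (0, 1, 0), (1, 0, 0)]
    # Vertex 3 bent by 90 degrees about the common edge (the Y axis here).
    bent = [(-1, 0, 0), (0, -1, 0), (0, 1, 0), (0, 0, 1)]

    local_pairs = triangle_pairs(triangles)          # ISRS: pair (0, 3) excluded
    all_pairs = list(combinations(range(4), 2))      # IRS-like: every pair
    print(rigidity_cost(bent, flat, local_pairs))
    print(rigidity_cost(bent, flat, all_pairs))
```

The sketch sums over unordered pairs, whereas equation (9) counts each pair
twice; this only rescales the cost by a constant factor.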

We should be careful not to confuse the non-rigidity of the
(three-dimensional) object with that of the internal model. Even if the object is
rigid, the internal model will change in a non-rigid manner until it converges
to the structure of the real object.

Specific computations


In the last two sections we described the Regge calculus and the ISRS. Now
we want to describe some specific computations that we did with different
configurations built up of triangles.

Our general idea is to work with an arbitrarily triangulated surface in
motion, allowing for bending deformations to take place. Initially we tested
the ISRS for a rotating triangle. Afterwards we applied the same method to
two adjacent triangles rotating and flexing over their common edge. Finally,
we considered six triangles forming a closed sequence of adjacent elements
which have three kinds of motion: (1) global rotation (each triangle rotates
by the same amount), (2) an umbrella type of motion (where the six points on
the perimeter have an oscillatory semirigid motion representing the closing or
opening of an umbrella) and (3) motion with one edge on the perimeter
changing its length by an oscillatory movement, thereby changing the curvature
of the object.

For the single triangle rotating rigidly we obtained results similar to those
of previous studies (Ullman 1984, Grzywacz and Hildreth 1985). We did not,
however, observe any optimal angle of rotation under which the system best
recovers structure.


We then proceeded to the two-triangle case. We gave this figure a
combination of two kinds of motion, a global rotation around a fixed axis
followed by a bending deformation over the common edge. The X (horizontal) and
Y (vertical) cartesian coordinates parameterize the image plane, while the Z
coordinate represents the (three-dimensional) depth value. We experimented
with varying the axis of (global) rotation, starting with it pointing along the
Y axis and then rotating it towards the Z axis by increments of 30 degrees. We
also varied the amount of bending and rotation between time frames. The
rotation varied from 10 to 60 degrees and the bending from 5 to 30 degrees.
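
The two-step motion described above (a global rotation followed by a bending
over the common edge) can be sketched as follows. This is a hypothetical
reconstruction for illustration only: the vertex coordinates, the function
names and the choice of an axis through the origin are our assumptions, and
the rotation is implemented with the standard Rodrigues formula rather than
with whatever the original simulation used.

```python
import numpy as np


def rotate_about_axis(points, axis_point, axis_dir, angle_deg):
    """Rodrigues rotation of points about the line through axis_point along axis_dir."""
    k = np.asarray(axis_dir, dtype=float)
    k = k / np.linalg.norm(k)
    t = np.radians(angle_deg)
    origin = np.asarray(axis_point, dtype=float)
    p = np.asarray(points, dtype=float) - origin
    rotated = (p * np.cos(t)
               + np.cross(k, p) * np.sin(t)
               + np.outer(p @ k, k) * (1.0 - np.cos(t)))
    return rotated + origin


def tilted_axis(phi_deg):
    """Rotation axis starting along Y and tilted by phi degrees towards Z."""
    phi = np.radians(phi_deg)
    return np.array([0.0, np.cos(phi), np.sin(phi)])


def next_frame(verts, rot_axis, rot_deg, bend_deg):
    """One time step: rigid rotation of the whole figure, then a bend of vertex 3.

    Assumes four vertices forming triangles (0, 1, 2) and (1, 2, 3), which
    share the edge (1, 2) used as the bending axis.
    """
    v = rotate_about_axis(verts, np.zeros(3), rot_axis, rot_deg)
    hinge_dir = v[2] - v[1]
    v[3] = rotate_about_axis(v[3:4], v[1], hinge_dir, bend_deg)[0]
    return v


def project(verts):
    """Orthographic projection onto the image plane: keep X and Y, drop Z."""
    return verts[:, :2]


if __name__ == "__main__":
    flat = np.array([[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0],
                     [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
    frame = next_frame(flat, tilted_axis(30.0), rot_deg=30.0, bend_deg=10.0)
    print(project(frame))
```

By construction the edge lengths inside each triangle are unchanged by a time
step, while the distance between the two vertices that do not lie on the common
edge changes with the bending angle.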


Figure 4 Views of the two-triangle figure with different bending angles

When the global rotation axis is at an angle of 30 or 45 degrees from the image
plane (which means 60 or 45 degrees, respectively, to the Z axis) the results of
the ISRS, the computed depth values, agree very well with the real values of
the simulation. On the other hand, for angles of 60 degrees (30 degrees with
the Z axis), and especially 90 degrees (parallel to the Z axis), the recovery of
structure from motion is not good. In the specific case of 90 degrees there is
almost no information gained, since only part of the figure is visible in the
sequence of frames. We also observed two well defined limits for the angle of
(global) rotation in each time frame. It has a lower bound of about 5 degrees
and an upper bound of about 90 degrees. We obtained the best results in
the range between 30 and 60 degrees. It seemed that if these angles were too
small or too large not enough information was available for the algorithm to
use. Intuitively, if the global rotation is too small, recovery of the structure
is unstable as too little new information is added between time frames. If
the rotation is too large the new information is too different from the old
(the figure is seen from a totally new point of view), so that the ISRS cannot
build up a uniform model of the figure. These general properties of the IRS
(and ISRS) were discussed in detail by Grzywacz and Hildreth (1985), where
it is argued that if the size of the incremental rotation angles decreases then
the deterioration of information is inversely proportional to the number of
frames.

The best values of the bending angle (between each time frame) lay in the
range of 10 to 30 degrees. Thus, since the bendings are made after the global
rotation of the figure, too large a bending angle can affect the robustness of
the algorithm. So only limited amounts of flexing are allowed between time
frames. Typically we did only one bending between each (global) rotation.

Figure 5 Error graph for the two-triangles. The vertical axis shows the
error function measure of the difference between the model and the
stimulus. (a) and (b) have bending angles 10 and 30 degrees respectively. The
rotation in each time frame is either 5 or 30 degrees.

Next we considered six triangles performing a global rigid rotation around
a fixed axis. Basically, the six-triangle figure consists of six adjacent
triangles which all share a common vertex, and any two adjacent triangles
always have a common edge. The (global) rotation of this six-triangle figure is
done with respect to an axis passing through the common vertex, and maintains
all triangles within a fixed (rigid) structure. The results we obtained are
identical, in nature, to the ones obtained for the two triangles. More
precisely, if the orientation of the axis of (global) rotation, with respect to
the image plane, is between 30 and 60 degrees, and the rotation angle (between
time frames) is between 30 and 60 degrees, then the ISRS algorithm is very
efficient in recovering structure from motion.

For the next set of simulations we consider the six-triangle figure to
simulate an "umbrella" type of motion. The "umbrella" type of motion consists
in having three of the perimeter points (chosen in alternation) perform an
oscillatory motion rather like the spokes of an umbrella being opened and
closed. The motion of the three remaining points is determined uniquely by
requiring the triangles to be rigid. This motion is illustrated in figure 6.

Figure 6 "Umbrella" motion, panels (a) and (b)

The umbrella type of motion can be defined in the following way. Let each
point on the perimeter of the umbrella be described by a vector whose origin
lies at a fixed point (the common vertex we introduced before) in space. We
also define an axis with a specific orientation in (three-dimensional) space
(the handle of the umbrella). Of these six points only three can move freely
in space, so that the remaining points are constrained by the motion of the
former ones. This can be easily understood by observing that the vector of
each constrained point can be expressed in terms of its adjacent unconstrained
points by the following formula

$$R_i = \lambda R_{i-1} + \mu R_{i+1} + \nu \, R_{i-1} \wedge R_{i+1} \tag{12}$$

where $R_i$ represents the unit vector defining the direction of the ith point (on
the perimeter) with respect to the origin and "$\wedge$" is the cross product. Using
simple algebraic manipulations, it can be shown that the parameters $\lambda$, $\mu$ and
$\nu$ are, respectively, given by

$$\lambda = \frac{\cos\alpha_{i-1,i} - (R_{i-1} \cdot R_{i+1}) \cos\alpha_{i,i+1}}{\det},$$

$$\mu = \frac{\cos\alpha_{i,i+1} - (R_{i-1} \cdot R_{i+1}) \cos\alpha_{i-1,i}}{\det}$$

and

$$\nu = \pm \sqrt{\frac{1 - \lambda^2 - \mu^2 - 2 \lambda \mu \, (R_{i-1} \cdot R_{i+1})}{\det}}$$

where

$$\det = 1 - (R_{i-1} \cdot R_{i+1})^2$$

The symbol "$\cdot$" represents the dot product and $\alpha_{i,i+1}$ represents the
angle between vectors $R_i$ and $R_{i+1}$. Notice that $\nu$ is determined up to a sign.
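
As a check on the constraint formula, the following NumPy sketch computes a
constrained direction from its two unconstrained neighbours. It is a minimal
sketch under stated assumptions: the function name `constrained_direction`,
its argument names and the `sign` parameter (which selects one of the two
solutions for the coefficient of the cross product) are ours, and the
coefficients are obtained by requiring the result to be a unit vector at the
prescribed angles from both neighbours.

```python
import numpy as np


def constrained_direction(r_prev, r_next, alpha_prev, alpha_next, sign=1.0):
    """Unit vector at angle alpha_prev from r_prev and alpha_next from r_next.

    r_prev and r_next are unit vectors (the adjacent unconstrained points);
    the angles are in radians. sign = +1 or -1 picks one of the two mirror
    solutions on either side of the plane spanned by r_prev and r_next.
    """
    c = float(np.dot(r_prev, r_next))
    det = 1.0 - c * c
    lam = (np.cos(alpha_prev) - c * np.cos(alpha_next)) / det
    mu = (np.cos(alpha_next) - c * np.cos(alpha_prev)) / det
    nu_sq = (1.0 - lam ** 2 - mu ** 2 - 2.0 * lam * mu * c) / det
    if nu_sq < -1e-12:
        raise ValueError("no unit vector satisfies both angle constraints")
    nu = sign * np.sqrt(max(nu_sq, 0.0))
    return lam * r_prev + mu * r_next + nu * np.cross(r_prev, r_next)


if __name__ == "__main__":
    r_prev = np.array([1.0, 0.0, 0.0])
    r_next = np.array([0.0, 1.0, 0.0])
    a = np.radians(60.0)
    print(constrained_direction(r_prev, r_next, a, a, sign=1.0))
    print(constrained_direction(r_prev, r_next, a, a, sign=-1.0))
```

The two signs give mirror-image positions of the constrained point; the sketch
leaves the choice between them to the caller.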

In this way, we label the six vertices of the perimeter of the umbrella
from 1 to 6, and allow points 1, 3 and 5 to move independently (in a globally
coherent way consistent with the triangles being rigid), with points 2,
4 and 6 satisfying the constraint (12). For the "umbrella" motion the three
unconstrained (perimeter) points move by discrete changes of the polar angles
$\theta_i$ (the angles between the vectors joining the points to the center and the
handle of the umbrella). The value of these polar angles was constrained to
lie between 60 and 120 degrees. We varied these angles by increments
ranging from 5 to 15 degrees.

Figure 7 Motion of one vertex

Figure 8 Side views of the "umbrella"

For these stimuli it was more difficult to obtain correct depth values.
This showed itself in the difficulty of obtaining the global minima
of the cost function between frames. The system often got trapped in
local minima during gradient descent. Some of these local minima had simple
interpretations. For example, some corresponded to a depth reversal of part
of the structure (of course, depth reversal of the whole structure is a possible
ambiguity when orthographic projection is used). These particular minima
could be removed by simple heuristics: one could find the endpoint of a
minimization, flip the sign of a depth value and see if this reduced the energy.
If it did, gradient descent could be restarted from this configuration. Not all
minima, however, could be removed in this way. Even if the global minimum
was always found, which we could do by an interactive algorithm, the ISRS
would not always converge to the correct result. Moreover, it would sometimes
approach the right result and then diverge from it. Thus it seemed to
display a number of the pathologies of the IRS (Grzywacz and Hildreth 1985).
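
The sign-flip heuristic described above can be sketched as follows. This is only an illustration under stated assumptions: `toy_energy` and `toy_grad` are a hypothetical double-well function standing in for the cost function of the scheme, and the names `descend` and `descend_with_flips`, the step size and the iteration count are likewise invented for the example.

```python
def descend(z, grad, step=0.05, iters=500):
    """Plain fixed-step gradient descent on a list of depth values."""
    z = list(z)
    for _ in range(iters):
        g = grad(z)
        z = [zi - step * gi for zi, gi in zip(z, g)]
    return z


def descend_with_flips(z0, energy, grad):
    """Gradient descent followed by the sign-flip restart heuristic.

    At the endpoint of a minimization, flip the sign of one depth value;
    if that lowers the energy, restart the descent from the flipped
    configuration. Stop when no single flip lowers the energy.
    """
    z = descend(z0, grad)
    improved = True
    while improved:
        improved = False
        for i in range(len(z)):
            trial = list(z)
            trial[i] = -trial[i]
            if energy(trial) < energy(z):
                z = descend(trial, grad)
                improved = True
                break
    return z


def toy_energy(z):
    # Hypothetical stand-in energy: each coordinate has a global minimum
    # at +1 and a shallower local minimum near -0.89.
    return sum((zi * zi - 1.0) ** 2 + 0.2 * (zi - 1.0) ** 2 for zi in z)


def toy_grad(z):
    # Analytic gradient of toy_energy.
    return [4.0 * zi * (zi * zi - 1.0) + 0.4 * (zi - 1.0) for zi in z]


start = [-0.5, 0.5]
trapped = descend(start, toy_grad)            # first value stays negative
result = descend_with_flips(start, toy_energy, toy_grad)
```

In this toy case plain descent leaves the first value in the sign-reversed local minimum, while the flip-and-restart variant reaches the lower-energy configuration. As the text notes, not every local minimum of the real cost function can be removed this way.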

Figure 9 The results for the umbrella. (a) and (b) show the convergence
without and with depth reversals. The vertical axis shows the error function
measure between the stimulus and the model.

We simulated the motion on the screen of a Symbolics LISP machine
(with the help of V. Inada). Informal psychophysics suggested that humans
also have difficulty estimating the depth for these stimuli, although they
usually got the correct qualitative result. This suggested testing how good the
ISRS results were qualitatively. To do this we also displayed the images and
the models resulting from the IRS from different viewpoints. If the viewpoint
was the same as for the simulation then naturally the projections of the
images and the models were identical. By altering the viewpoint from this initial
direction we could obtain a qualitative measure of the similarity between the
image and the model. We found that, provided the axis of the umbrella is
within about 30 degrees of the normal to the image plane and provided the
result is viewed from a direction less than 30 degrees away from the normal to
the image plane, the motion looks very similar to the simulated motion.
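
The viewpoint comparison can be illustrated with a short sketch. It assumes orthographic projection along the z axis and a change of viewpoint modelled as a rotation about the vertical (y) axis; the function names and the RMS discrepancy measure are assumptions for illustration, not the measure used in the experiments.

```python
import math


def rotate_about_y(points, angle_deg):
    """Rotate 3D points (x, y, z) about the y axis by angle_deg degrees."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]


def orthographic(points):
    """Orthographic projection onto the image plane (drop z)."""
    return [(x, y) for x, y, _ in points]


def projection_rms(stimulus, model, angle_deg):
    """RMS image distance between stimulus and model from a new viewpoint."""
    p = orthographic(rotate_about_y(stimulus, angle_deg))
    q = orthographic(rotate_about_y(model, angle_deg))
    sq = sum((px - qx) ** 2 + (py - qy) ** 2
             for (px, py), (qx, qy) in zip(p, q))
    return math.sqrt(sq / len(p))


# The model matches the stimulus in the image (x, y) but not in depth (z).
stimulus = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.5), (0.0, 1.0, 0.2)]
model = [(0.0, 0.0, 1.1), (1.0, 0.0, 0.3), (0.0, 1.0, 0.2)]

same_view = projection_rms(stimulus, model, 0.0)
side_view = projection_rms(stimulus, model, 30.0)
```

From the original viewpoint the discrepancy is zero whatever the depth errors are. In this sketch it grows with the sine of the viewing angle, so depth errors become visible only as the viewpoint moves away from the normal to the image plane.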

The "umbrella" motion is the most complicated one which has been simulated
by either the IRS or the ISRS. We applied the IRS to some of the
individual  triangles  of  the  umbrella  and  obtained  poor  results  (worse  than  the 
estimates  of  the  ISRS).  These  triangles  move  so  that  in  some  configurations 
their  projected  area  is  practically  zero  (see  figure  6).  We  did  not  apply  the 
IRS to the entire display, arguing that the large amount of non-rigidity of the
entire  structure  would  violate  the  assumptions  of  the  IRS  and  prevent  it  from 
giving  the  correct  answer.  We  know  of  no  simulations  of  the  IRS  which  work 
well  under  these  conditions.  This  suggests  that  it  is  a  better  strategy  to  try 
an  ISRS  over  the  whole  object  than  to  apply  the  IRS  to  each  part  of  the 
object separately. The global effect of the ISRS creates a form of cooperation
between  parts  of  the  object  which  might  be  badly  estimated  otherwise. 

Finally, we varied one edge of the perimeter of the six triangles with the
others maintained constant. This caused a change of the Gaussian curvature
of the object. For small variations the results were good, but we did not do
extensive experiments.

Conclusions 

We  showed  that  it  is  possible  to  recover  structure  from  motion  for  figures 
constructed  of  triangles  which  are  allowed  to  perform  non-rigid  motion  with 
bending  deformations.  The  basic  idea  is  to  segment  the  surface  using  Regge 
calculus,  approximating  it  by  a  net  of  triangles  with  the  curvature  given  by 
the  sum  of  the  deficit  angles,  and  to  recover  structure  from  motion  in  terms 
of  the  Incremental  Semirigidity  Scheme.  The  vertices  of  the  triangulation  are 
used  in  the  ISRS  to  obtain  structure  from  motion. 
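
As an illustration of the deficit angle mentioned above, the sketch below uses the standard definition for an interior vertex of a triangulated surface: 2π minus the sum of the angles of the triangles meeting at that vertex. The function names and the hexagonal example geometry are assumptions chosen to resemble the six-triangle "umbrella"; they are not taken from the original simulations.

```python
import math


def angle_between(u, v):
    """Angle in radians between two non-zero 3-vectors."""
    d = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    c = max(-1.0, min(1.0, d / (nu * nv)))  # guard against rounding
    return math.acos(c)


def deficit_angle(apex, rim):
    """2*pi minus the sum of the triangle angles meeting at the apex.

    rim lists the perimeter vertices in order around the apex; each
    consecutive pair (wrapping around) forms one triangle with the apex.
    """
    total = 0.0
    n = len(rim)
    for k in range(n):
        u = tuple(p - a for p, a in zip(rim[k], apex))
        v = tuple(p - a for p, a in zip(rim[(k + 1) % n], apex))
        total += angle_between(u, v)
    return 2.0 * math.pi - total


# Six perimeter points on a unit circle in the plane z = 0.
rim = [(math.cos(math.radians(60 * k)), math.sin(math.radians(60 * k)), 0.0)
       for k in range(6)]

flat = deficit_angle((0.0, 0.0, 0.0), rim)   # apex in the plane
cone = deficit_angle((0.0, 0.0, 1.0), rim)   # apex raised above the plane
```

A flat configuration has zero deficit angle, while raising the apex gives a positive deficit. Pure bending about the edges leaves the triangle angles, and hence the deficit angle, unchanged, whereas changing an edge length alters it.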

The case of two triangles performing a rigid global rotation followed by a
local bending deformation shows that the ISRS algorithm performs well when the
axis of global rotation is close to parallel to the image plane (parallel to
the Y axis) and the angle of rotation lies in the range between
30 and 60 degrees. However, if this axis is close to being perpendicular to the
image plane (parallel to the X axis), the algorithm performs poorly.
In addition, the bending angle has to be small (in the range between 10 and
30 degrees).

If we increase the number of triangles and the motion remains globally
rigid, then the recovery of structure from motion is best if the
rotation axis is close to parallel to the image plane, as in the
case of the two triangles. This means that increasing the number of points
for this type of motion does not affect the algorithmic robustness of the ISRS. On
the other hand, if the six-triangle figure is allowed to perform the "umbrella"
type of motion, then unless the axis passing through the
common vertex is nearly parallel to the normal to the image plane,
the algorithm does not perform well. Some informal psychophysics suggests
that humans have some difficulty in correctly estimating the depth, but get
the correct qualitative motion. The algorithm also often seemed to get the
correct qualitative motion.

We have not yet discussed which points or elements on the surface are
chosen to be the vertices of the triangulation scheme, nor how the
correspondence between these points, for successive time frames, is done. Ullman (1984)
suggested using features (detected by a suitable operator). For example,
Hildreth (private communication) has used the texture features of a cup as input
to the IRS. Detectable features of this type seem a natural choice for the
vertices of the triangulation. It would be simple to adapt the IRS further to deal
with objects made of rigidly moving subparts, for example rectangles, flexing
at their joints. An extended model of this type might be able to explain
Johansson's (1975) results for moving figures. In these experiments light sources
were attached to the joints of moving figures and the correct (non-rigid)
motion was retrieved. We should note that in this case the positions of the light
sources suggest natural places to segment the object. The correspondence
between successive time frames could be done by tracking, or by a minimal
mapping scheme (1984). An interesting point is that the IRS depends only on
the positions of the points and not on the lines connecting them. We did some
informal psychophysics on the "umbrella" motion, changing the triangulation
by altering which points were connected by which lines (without changing the
total number of points). These changes sometimes altered the depth
perception of the object, in contrast to what the IRS would predict. These findings
are only preliminary and need to be studied more systematically. However,
they suggest that the choice of triangulation is important.

We  conclude  from  this  that  although  the  ISRS  (and  IRS)  is  good  for  a 
certain  range  of  motions  it  is  unable,  at  least  without  modifications,  to  cope 
with  all  motions.  More  studies  are  needed  to  check  for  which  motions  these 
schemes  are  effective.  The  IRS  has  mostly  been  demonstrated  on  constant 
rigid  rotation  about  an  axis  and  needs  to  be  tested  for  a  larger  class  of  motions. 
We  argue  that  the  ISRS  may  be  more  effective  than  the  IRS  for  non-rigid 
flexing motion since it has greater flexibility. Grzywacz and Hildreth have
suggested modifications to the basic IRS and report better results (private
communication).